 co-expression network




Inferring independent sets of Gaussian variables after thresholding correlations

arXiv.org Machine Learning

We consider testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables. We assume that this set is selected via a very simple approach that is commonly used across scientific disciplines: we select a set of variables for which the correlation with all variables outside the set falls below some threshold. Unlike other settings in selective inference, failure to account for the selection step leads, in this setting, to excessively conservative (as opposed to anti-conservative) results. Our proposed test properly accounts for the fact that the set of variables is selected from the data, and thus is not overly conservative. To develop our test, we condition on the event that the selection resulted in the set of variables in question. To achieve computational tractability, we develop a new characterization of the conditioning event in terms of the canonical correlation between the groups of random variables. In simulation studies and in the analysis of gene co-expression networks, we show that our approach has much higher power than a "naive" approach that ignores the effect of selection.
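The selection step described in this abstract can be illustrated with a minimal sketch. It assumes the selected sets are the connected components of the graph that links two variables whenever their absolute sample correlation exceeds the threshold; every variable in such a component then has absolute correlation at or below the threshold with every variable outside it. The function name `thresholded_components` and the toy data are hypothetical, and the sketch covers only the selection, not the paper's conditional test.

```python
import numpy as np

def thresholded_components(X, threshold):
    """Label each column of X by the connected component it falls in,
    where two variables are linked if |sample correlation| > threshold.
    (Illustrative assumption: this is one way to form the selected sets.)"""
    corr = np.corrcoef(X, rowvar=False)      # p x p sample correlation matrix
    adjacent = np.abs(corr) > threshold      # boolean adjacency matrix
    p = corr.shape[0]
    labels = -np.ones(p, dtype=int)          # -1 marks "not yet visited"
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        # Depth-first search over the thresholded correlation graph.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(adjacent[node]):
                if labels[neighbor] < 0:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels

# Toy data: two pairs of strongly correlated variables, independent across pairs.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 200))
X = np.column_stack([
    z1, z1 + 0.1 * rng.normal(size=200),
    z2, z2 + 0.1 * rng.normal(size=200),
])
print(thresholded_components(X, threshold=0.5))
```

Variables that share a label form one candidate set; a naive analysis would then test independence between that set and the rest as if the set had been fixed in advance.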


BioData Mining

#artificialintelligence

The inference of biological networks is a highly relevant and challenging task in systems biology and integrative bioinformatics. Biological networks are graphs in which nodes represent genes or proteins, and a connection between them indicates some kind of biological relationship, e.g. regulatory or functional. Network inference is, in essence, an attempt to reverse engineer the biological relationships from high-throughput biological data [1]. Most biological network inference methods focus on the definition of gene regulatory networks, in which edges represent direct regulatory interactions between genes [2–4]. Far less effort has been put into the design of methods to build functional networks in which a connection indicates a functional relationship, e.g.


Differential gene co-expression networks via Bayesian biclustering models

arXiv.org Machine Learning

Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are locally co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes whose covariation may be observed in only a subset of the samples. Our biclustering method, BicMix, has desirable properties, including allowing overcomplete representations of the data, computational tractability, and jointly modeling unknown confounders and biological signals. Compared with related biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a method to recover gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and recover a gene co-expression network that is differential across ER+ and ER- samples.


Characterization of differentially expressed genes using high-dimensional co-expression networks

arXiv.org Machine Learning

We present a technique to characterize differentially expressed genes in terms of their position in a high-dimensional co-expression network. The set-up of Gaussian graphical models is used to construct representations of the co-expression network in such a way that redundancy and the propagation of spurious information along the network are avoided. The proposed inference procedure is based on the minimization of the Bayesian Information Criterion (BIC) in the class of decomposable graphical models. This class of models can be used to represent complex relationships and has suitable properties that allow effective inference in problems with a high degree of complexity (e.g. several thousands of genes) and a small number of observations (e.g. 10-100), as typically occurs in high-throughput gene expression studies. Taking advantage of the internal structure of decomposable graphical models, we construct a compact representation of the co-expression network that makes it possible to identify the regions with a high concentration of differentially expressed genes. It is argued that differentially expressed genes located in highly interconnected regions of the co-expression network are less informative than differentially expressed genes located in less interconnected regions. Based on that idea, a measure of uncertainty that resembles the notion of relative entropy is proposed. Our methods are illustrated with three publicly available data sets on microarray experiments (the largest involving more than 50,000 genes and 64 patients) and a short simulation study.
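For reference, the criterion minimized in this abstract has the standard form below. The notation is assumed here and not taken from the paper: $G$ is a candidate graph, $\ell(\hat{\theta}_G)$ is the maximized log-likelihood under $G$, $k_G$ is the number of free parameters, and $n$ is the number of observations. The parameter count shown is the usual one for a zero-mean Gaussian graphical model on $p$ variables with edge set $E$; the paper may count parameters differently.

```latex
\mathrm{BIC}(G) = -2\,\ell(\hat{\theta}_G) + k_G \log n,
\qquad
k_G = p + |E| .
```

Under this form, each added edge must raise twice the maximized log-likelihood by more than $\log n$ to lower the criterion, which favors sparse graphs when $n$ is small.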


Abstracting Complex Interaction Networks

AAAI Conferences

The exploration of complex interaction networks has attracted considerable interest in various fields, ranging from fundamental biology and medicine to statistical physics and information technology. In -omics disciplines, significant progress has been made in understanding the large-scale properties and the biological relevance of these interactions. Properties such as the scale-free distribution of node connectivity or centrality are commonly described in such complex interaction systems. In many of these studies the analysis of network topology is complemented by a semantic analysis that may rely on different labels associated with the interacting entities. One bottleneck of these semantic analyses is that they are computationally costly. In this paper we present a framework to explore abstractions of networks that are useful to speed up the computation of ground network measures. Such abstraction mechanisms may be used to efficiently provide accurate approximations of ground network measures.